****************************************************************
Explanation of the files associated with the ebaymotors project.

Only the final data file (ebaydatafinal.dta) and the do file used in
the analysis (empirics.do) are posted here, but all files
and code are available from me on request.

Greg Lewis, Dec 28th 2009
****************************************************************

A. Data Files

There are three sets of data, in three separate directories.  They are:

(i) rawdata

	contains the data downloaded from eBay's completed listings in 2006,
	split into the following directories:
		
		  - standard = listings that terminated as usual, auction-style
		  - offers = listings in classified style, with offers rather than auction
		  - buyitnow = fixed price listings
		  - earlysale = item was sold early
		  - withdrawnerror = item was withdrawn
		  - notforsale = item is no longer for sale
		  - other = other source of error
	
(ii) intermediatedata, containing intermediate data files

	- output.txt is a text file, delimited by the ^ character, consisting of 
	   variables mined from analysis of the html pages found in the standard 
	   directory of rawdata
	   
	- processoutput.txt is a text file, delimited by the ^ character, consisting of
	  variables mined from the Edmunds.com used car appraiser.  
		
	- historycleaned.txt is a text file, delimited by the ^ character, that consists
	  of the 25 highest bids in each auction, keeping only the highest bid by each
	  bidder.  It was derived by querying eBay's completed listings history; it can
	  probably no longer be replicated, as eBay only stores completed listings
	  histories for a limited time.  The history covers only the standard listings.
	  
	- phrasedata_new.txt is a ^ delimited text file, indicating the absence or presence
	  of certain key phrases, which were determined from a sample corpus
	  
	- softwareoutput.txt is a ^ delimited text file, indicating the number of photos
	  and type of software used on a particular webpage. 
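
	The intermediate files above share one layout: plain text with fields
	separated by the ^ character. The following is a minimal Python 3 sketch of
	how such a file could be read, and of the reduction described for
	historycleaned.txt (highest bid per bidder, then the 25 highest of those).
	It is not the code used to build the files. The column layout
	(auction id, bidder, amount), the sample values, and the function names
	are all assumptions made for illustration.

```python
import csv
import io


def read_caret_delimited(handle):
    """Return the non-empty rows of a ^-delimited text stream as lists of strings."""
    # QUOTE_NONE treats quote characters as ordinary text
    # (assumption: the files do not use quoting).
    reader = csv.reader(handle, delimiter="^", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


def top_bids(bids, limit=25):
    """Keep each bidder's highest bid per auction, then the `limit` highest of those.

    `bids` is an iterable of (auction_id, bidder, amount) tuples.
    This layout is hypothetical, not the documented file format.
    """
    best = {}
    for auction, bidder, amount in bids:
        key = (auction, bidder)
        if key not in best or amount > best[key]:
            best[key] = amount

    per_auction = {}
    for (auction, bidder), amount in best.items():
        per_auction.setdefault(auction, []).append((bidder, amount))

    # Sort each auction's bids from highest to lowest and truncate.
    return {
        auction: sorted(rows, key=lambda r: r[1], reverse=True)[:limit]
        for auction, rows in per_auction.items()
    }


# Made-up sample standing in for a bid history file.
sample = io.StringIO("1001^alice^5200\n1001^bob^5000\n1001^alice^5500\n")
rows = read_caret_delimited(sample)
bids = [(auction, bidder, float(amount)) for auction, bidder, amount in rows]
print(top_bids(bids))
```

	To read a real file, pass an open file object instead of the in-memory
	sample, for example one opened on intermediatedata/output.txt with
	newline="". The encoding of the files is not documented here and may need
	to be set explicitly.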
	 
(iii) finaldata, containing the final data file ebaydatafinal.dta


B. Program Flow

 B.1. Programs operating on the raw data
 
 - Process_March17.py is a python program which operates on the raw data for
   standard listings, to produce output.txt.  It takes as additional input the text
   file processoutput.txt, which is a set of book values for various model-trim-years
   taken from Edmunds.com.  
 	
 - textanalysis.py is a python program which operates on the raw data for standard
   listings, to produce phrasedata_new.txt
 
 - software_remote.py is a python program which operates on the raw data for standard 
   listings to produce softwareoutput.txt
 
 B.2. Programs operating on intermediate data files

 - Cleaning.do is a stata file that produces the final data file ebaydatafinal.dta from:

 	(i) output.txt
 	(ii) historycleaned.txt
 	(iii) phrasedata_new.txt
 	(iv) softwareoutput.txt
 	
 - Empirics.do is a stata file that analyzes the final data file ebaydatafinal.dta 
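
 The flow described in B.1 and B.2 can be summarized as a dependency graph.
 The sketch below (Python 3.9 or later, for the standard graphlib module)
 records the inputs and outputs listed in sections A and B and derives one
 valid run order from them. It only illustrates the ordering; it does not run
 any of the programs, and the "rawdata/standard" label and the run_order
 helper are names chosen here for illustration.

```python
from graphlib import TopologicalSorter

# program -> (inputs, outputs), as listed in sections A and B.
STEPS = {
    "Process_March17.py": (["rawdata/standard", "processoutput.txt"], ["output.txt"]),
    "textanalysis.py": (["rawdata/standard"], ["phrasedata_new.txt"]),
    "software_remote.py": (["rawdata/standard"], ["softwareoutput.txt"]),
    "Cleaning.do": (
        ["output.txt", "historycleaned.txt", "phrasedata_new.txt", "softwareoutput.txt"],
        ["ebaydatafinal.dta"],
    ),
    "Empirics.do": (["ebaydatafinal.dta"], []),
}


def run_order(steps):
    """Return the programs in an order where every input is produced before use."""
    # Map each output file to the program that produces it.
    producer = {out: prog for prog, (_, outs) in steps.items() for out in outs}
    # A program depends on the producers of its inputs. Inputs with no listed
    # producer (raw data, historycleaned.txt, processoutput.txt) are taken as given.
    graph = {
        prog: {producer[name] for name in ins if name in producer}
        for prog, (ins, _) in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = run_order(STEPS)
print(order)
```

 The three python programs do not depend on each other, so they can run in
 any order, but all of them must finish before Cleaning.do, which in turn
 must finish before Empirics.do.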

 B.3. Helper Programs
  
  - beautifulsoup.py, a python module for parsing html pages, useful for text analysis
  


